HDDS-14866. Enhance DiskBalancer Report to show individual volume's density #9969

Open
Gargi-jais11 wants to merge 5 commits into apache:master from Gargi-jais11:HDDS-14866

@Gargi-jais11 Gargi-jais11 commented Mar 24, 2026

What changes were proposed in this pull request?

The DiskBalancer report today shows only:

ozone admin datanode diskbalancer report --in-service-datanodes
Report result:
Datanode                                VolumeDensity
dn-hostname-3 (10.141.248.70:19864)     0.09267551461620249
dn-hostname-1 (10.141.128.135:19864)    0.06619677701803184
dn-hostname-2 (10.141.126.8:19864)      0.026044182616493772

So users see only a single aggregate VolumeDensity per datanode, with no per-disk breakdown.
Given the report above, a user who runs DiskBalancer with a threshold lower than 10%, say 5%, would expect it to start on DN-3 and DN-1. But it does not start, which creates the impression that DiskBalancer is not working correctly.
This is because the threshold is actually checked against each volume's utilisation: whether it is above, below, or within the threshold range.

Proposed Solution:
We should also show each volume's density, along with each volume's utilisation and pre-allocated container bytes.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14866

How was this patch tested?

Added a unit test and tested manually as well.
DiskBalancer Report before patch:

ozone admin datanode diskbalancer report --in-service-datanodes
Report result:
Datanode                                VolumeDensity
dn-hostname-3 (10.141.248.70:19864)     0.09267551461620249
dn-hostname-1 (10.141.128.135:19864)    0.06619677701803184
dn-hostname-2 (10.141.126.8:19864)      0.026044182616493772

// json
bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes --json
[ {
  "datanode" : "dn-hostname-3 (10.141.248.70:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.09267551461620249
}, {
  "datanode" : "dn-hostname-1 (10.141.128.135:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.06619677701803184
}, {
  "datanode" : "dn-hostname-2 (10.141.126.8:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.026044182616493772
} ]

DiskBalancer Report after enhancement:

bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes       
Report result:
Datanode: ozone-datanode-1.ozone_default (172.18.0.8:19864)
Aggregate VolumeDataDensity: 0.002000812888676179
IdealUsage: 0.1079634239173063 | Threshold: 10.0% | ThresholdRange: (0.007963423917306298, 0.20796342391730632)

Volume Details -:

StorageID                                     StoragePath                                        VolumeDensity             Utilization               Pre-Allocated Container Bytes
DS-c1e30832-dd36-444d-81fa-949c70b15b22       /data/hdds1/hdds                                   0.00087753816739934150    0.10884096208470564000    1771044864               
DS-d962ecab-8e1c-408a-87b0-b6d53456c86f       /data/hdds2/hdds                                   0.00012286827693874790    0.10808629219424505000    877658112                
DS-48547294-c3ef-40ef-a48a-14bf3b2acad8       /data/hdds3/hdds                                   0.00100040644433808950    0.10696301747296821000    0                        

-------

Datanode: ozone-datanode-2.ozone_default (172.18.0.7:19864)
Aggregate VolumeDataDensity: 1.940025425348213E-6
IdealUsage: 0.10798832091026495 | Threshold: 10.0% | ThresholdRange: (0.00798832091026494, 0.20798832091026495)

Volume Details -:

StorageID                                     StoragePath                                        VolumeDensity             Utilization               Pre-Allocated Container Bytes
DS-e8505479-f9fd-4705-b59e-c2bca68f0382       /data/hdds1/hdds                                   0.00000097001271268105    0.10798735089755226000    943718400                
DS-d0f5f9cd-920f-4ec9-9248-162ad3e1ad11       /data/hdds2/hdds                                   0.00000097001271266717    0.10798929092297761000    943718400                
DS-212374f2-15bc-4f54-b64e-30f378da456a       /data/hdds3/hdds                                   0.00000000000000000000    0.10798832091026495000    944766976                

-------

Datanode: ozone-datanode-3.ozone_default (172.18.0.9:19864)
Aggregate VolumeDataDensity: 1.2933502835515975E-6
IdealUsage: 0.10797732743285464 | Threshold: 10.0% | ThresholdRange: (0.007977327432854633, 0.20797732743285463)

Volume Details -:

StorageID                                     StoragePath                                        VolumeDensity             Utilization               Pre-Allocated Container Bytes
DS-8364a6cc-0d6b-471c-9b3f-fb2b68a6ef27       /data/hdds1/hdds                                   0.00000064667514178274    0.10797668075771286000    968884224                
DS-e9705dfb-f66d-4b21-a619-7c41c53f7cd6       /data/hdds2/hdds                                   0.00000032333757088443    0.10797765077042552000    966787072                
DS-c64998b1-8a10-4506-b358-e1ae278ed71c       /data/hdds3/hdds                                   0.00000032333757088443    0.10797765077042552000    966787072                


Note:
  - Aggregate VolumeDataDensity: Sum of per-volume density (deviation from ideal); higher means more imbalance.
  - IdealUsage: Target utilization ratio (0-1) when volumes are evenly balanced.
  - ThresholdRange: Acceptable deviation (percent); volumes within IdealUsage +/- Threshold are considered balanced.
  - VolumeDensity: Deviation of a particular volume's utilization from IdealUsage.
  - Utilization: Ratio of actual used space to capacity (0-1) for a particular volume.
  - Pre-Allocated Container Bytes: Space reserved for containers not yet written to disk.
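
The definitions in the note can be checked against the first datanode in the report above. This is a minimal sketch, assuming VolumeDensity is the absolute difference between a volume's utilization and IdealUsage, and that the threshold range is IdealUsage plus or minus Threshold/100. Both assumptions agree with the printed numbers, but the function name `volume_report` is hypothetical and is not taken from the patch.

```python
def volume_report(utilizations, ideal_usage, threshold_percent):
    """Recompute the report columns from per-volume utilization (illustrative only)."""
    delta = threshold_percent / 100.0
    lower, upper = ideal_usage - delta, ideal_usage + delta
    # Assumption: per-volume density is the absolute deviation from the ideal usage.
    densities = [abs(u - ideal_usage) for u in utilizations]
    return {
        "densities": densities,
        "aggregate": sum(densities),
        "range": (lower, upper),
        "out_of_range": [u for u in utilizations if not (lower <= u <= upper)],
    }

# Values copied from ozone-datanode-1 in the report above.
report = volume_report(
    [0.10884096208470564, 0.10808629219424505, 0.10696301747296821],
    ideal_usage=0.1079634239173063,
    threshold_percent=10.0,
)
print(report["aggregate"])     # close to the printed 0.002000812888676179
print(report["out_of_range"])  # empty list: every volume is inside the range
```

With a 10% threshold every volume on this datanode sits inside the range, so under these assumptions no balancing would be expected even though the aggregate density is non-zero.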
  
  // json output
  
  bash-5.1$ ozone admin datanode diskbalancer report --in-service-datanodes --json
[ {
  "datanode" : "ozone-datanode-5.ozone_default (172.18.0.9:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 4.1149758045477824E-6,
  "idealUsage" : 0.0714138324361701,
  "threshold %" : 10.0,
  "thresholdRange" : "(-0.02858616756382991000, 0.17141383243617010000)",
  "volumes" : [ {
    "storageId" : "DS-40df522d-9d27-484f-a8a8-08496b71ce07",
    "storagePath" : "/data/hdds1/hdds",
    "volumeDensity" : 2.0574879022738912E-6,
    "utilization" : 0.07141588992407237,
    "pre-Allocated container bytes" : 0
  }, {
    "storageId" : "DS-b1b8a81e-bc33-4d1d-a3f6-5b744a156ddc",
    "storagePath" : "/data/hdds2/hdds",
    "volumeDensity" : 2.0574879022738912E-6,
    "utilization" : 0.07141177494826782,
    "pre-Allocated container bytes" : 0
  } ]
}, {
  "datanode" : "ozone-datanode-4.ozone_default (172.18.0.11:19864)",
  "action" : "report",
  "status" : "success",
  "volumeDensity" : 0.0,
  "idealUsage" : 0.07143691949655417,
  "threshold %" : 10.0,
  "thresholdRange" : "(-0.02856308050344583000, 0.17143691949655418000)",
  "volumes" : [ {
    "storageId" : "DS-4a507b71-7f94-4af8-a8b9-0c0d959cb98c",
    "storagePath" : "/data/hdds1/hdds",
    "volumeDensity" : 0.0,
    "utilization" : 0.07143691949655417,
    "pre-Allocated container bytes" : 0
  }, {
    "storageId" : "DS-28f69ea7-c11e-4dc3-8a2b-245ebc5fe584",
    "storagePath" : "/data/hdds2/hdds",
    "volumeDensity" : 0.0,
    "utilization" : 0.07143691949655417,
    "pre-Allocated container bytes" : 0
  } ]
} ]
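
The JSON form lends itself to scripting. The sketch below lists any volume whose utilization falls outside the threshold range. It assumes the field names exactly as they appear in the sample output above (`idealUsage`, `threshold %`, `volumes`, `utilization`), which may still change during review; `volumes_outside_range` is a hypothetical helper, not part of the CLI.

```python
import json

def volumes_outside_range(report_json):
    """Return (datanode, storageId, utilization) for volumes outside idealUsage +/- threshold."""
    flagged = []
    for dn in json.loads(report_json):
        delta = dn["threshold %"] / 100.0
        lower, upper = dn["idealUsage"] - delta, dn["idealUsage"] + delta
        for vol in dn.get("volumes", []):
            if not (lower <= vol["utilization"] <= upper):
                flagged.append((dn["datanode"], vol["storageId"], vol["utilization"]))
    return flagged

# Trimmed copy of the first datanode entry from the sample output above.
sample = """[ {
  "datanode" : "ozone-datanode-5.ozone_default (172.18.0.9:19864)",
  "idealUsage" : 0.0714138324361701,
  "threshold %" : 10.0,
  "volumes" : [ {
    "storageId" : "DS-40df522d-9d27-484f-a8a8-08496b71ce07",
    "utilization" : 0.07141588992407237
  }, {
    "storageId" : "DS-b1b8a81e-bc33-4d1d-a3f6-5b744a156ddc",
    "utilization" : 0.07141177494826782
  } ]
} ]"""

print(volumes_outside_range(sample))  # [] for this sample: both volumes are in range
```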


@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review March 24, 2026 07:14
@sreejasahithi

@Gargi-jais11, can you please add the before and after of the JSON output?

@sreejasahithi sreejasahithi left a comment

Thanks @Gargi-jais11 for working on this. I left a few comments.

@Gargi-jais11

@ChenSammi Please review this PR.

@ChenSammi

@Gargi-jais11, why do we show only "Pre-Allocated Container Bytes" as the usage detail? Imagine you are a user who wonders why no container is being moved. What would your steps be to investigate the issue based on these new outputs? Which outputs would be most useful?

@Gargi-jais11

@ChenSammi The utilisation shown for each DN is definitely what we need the user to understand. But in the DN UI, the used space shown for each volume is always less than the actual used space, because the actual figure also includes pre-allocated and reserved space. So I am showing the pre-allocated bytes to help the user understand why, for example, the used bytes shown are 3GB on a volume with 10GB capacity while the utilisation is 50%: the other 2GB is occupied by pre-allocated space.

If you suggest, maybe we can also add the usedBytes for that volume, which would clearly show the user that usedBytes + pre-allocated = utilisation for that DN.

@Gargi-jais11

2026-03-06 12:27:02,444 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Disk balancing state - idealUsage=0.5094568501, thresholdPercentage=10.0%, thresholdRange=(0.4094568501, 0.6094568501), containerSize=5368709120
2026-03-06 12:27:02,444 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Volume[0] - disk=DS-bb305d7f-d780-4832-962b-1174fc11757d, utilization=0.5027842722, capacity=53681722491, effectiveUsed=26990325773, available=53534941184, usableSpace=5216560238, committedBytes=26843544466, delta=0
2026-03-06 12:27:02,444 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Volume[1] - disk=DS-baa06cd6-702b-49fc-9887-e241bce06863, utilization=0.5027846478, capacity=53681722491, effectiveUsed=26990345938, available=53534920704, usableSpace=5216540073, committedBytes=26843544151, delta=0
2026-03-06 12:27:02,445 DEBUG [DiskBalancerService#2]-org.apache.hadoop.ozone.container.diskbalancer.policy.DefaultVolumeChoosingPolicy: Volume[2] - disk=DS-ef78e643-5126-4fa0-8c30-8ff15564f77f, utilization=0.5228016301, capacity=53681722491, effectiveUsed=28064892027, available=25616830464, usableSpace=4141993984, committedBytes=0, delta=0

Take the example above. If a user sets a 10% threshold, the balancer won't start: from the threshold range we can see that all volumes are within the range, so there is no movement, as per the debug log.
But what the user sees in the DN UI for each volume's utilisation is:

Disk1-50%
Disk2-1%
Disk3-1%

So with a 10% threshold the user assumes balancing should start. Showing the utilisation of each volume would give the clear picture. As per the log above, all utilisation on the DN is somewhere around 52%:

Disk1-50%
Disk2-50%
Disk3-52%

The user will then ask why utilisation is high even though very few used bytes are shown on a particular volume. That is why I added pre-allocated container bytes: to show that they also contribute to the used bytes and make the utilisation high.
I hope this clarifies why I am also adding pre-allocated bytes.
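
The arithmetic behind the debug log can be reproduced directly. The sketch below assumes, based only on the numbers in the log lines above, that effectiveUsed = capacity - available + committedBytes, that utilization = effectiveUsed / capacity, and that idealUsage is total effectiveUsed over total capacity. These formulas are inferred from the log values, not confirmed from the DiskBalancer source, and the helper name is hypothetical.

```python
# Values copied from the three Volume[...] DEBUG lines above.
volumes = [
    {"capacity": 53681722491, "available": 53534941184, "committed": 26843544466},
    {"capacity": 53681722491, "available": 53534920704, "committed": 26843544151},
    {"capacity": 53681722491, "available": 25616830464, "committed": 0},
]

def effective_used(v):
    # Assumption: space physically consumed plus space committed to containers.
    return v["capacity"] - v["available"] + v["committed"]

# Assumption: ideal usage is total effective usage divided by total capacity.
ideal = sum(effective_used(v) for v in volumes) / sum(v["capacity"] for v in volumes)
lower, upper = ideal - 0.10, ideal + 0.10  # 10% threshold

for i, v in enumerate(volumes):
    raw = (v["capacity"] - v["available"]) / v["capacity"]  # ignores committed bytes
    util = effective_used(v) / v["capacity"]                 # includes committed bytes
    print(f"Volume[{i}] raw={raw:.4f} utilization={util:.4f} in_range={lower <= util <= upper}")
```

Under these assumptions the first two volumes hold almost no written data, yet their committed bytes lift their utilization to about 0.50, which is why all three volumes fall inside the threshold range.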
